IVFFLAT/IVFPQ/CAGRA support bf16, float16, int8 and uint8 quantization by cpegeric · Pull Request #25095 · matrixorigin/matrixone

cpegeric · 2026-06-23T14:42:27Z

What type of PR is this?

Which issue(s) this PR fixes:

What this PR does / why we need it:

IVFFLAT supports bf16, float16, int8 and uint8 quantization.
Fix a bug in GPU concurrent k-means clustering.
IVFPQ/CAGRA support float32 and float16 as base types, with int8 and uint8 quantization.
Fix slow int8 quantization on GPU by moving the computation to the CPU.
Add basic array functions for bf16, float16, int8 and uint8.

cuvs CAGRA needs at least intermediate_graph_degree rows per sub-index (default 128); IVF-PQ k-means needs at least `lists` rows. When the source has a partial trailing chunk (`total % IndexCapacity`) below the cuvs minimum, or the whole dataset is too small, the build would error.

Pre-count source rows up front and compute cdcCutoff via the formula

    cdcCutoff = total - lastChunkSize   when lastChunkSize < threshold
              = total                   otherwise

Rows < cdcCutoff still feed the cuvs builder as today; the trailing rows buffer into a per-(table, index) PendingRecord slice and end() emits them as tag=1 CDC records under vectorindex.CdcTailId via the new cuvs.SaveSmallTailAsCdc helper. Search-side brute-force replay already serves tag=1 records when no tag=0 model exists for that slice, so queries keep working until a future rebuild lifts the tail back above threshold.

Empty source is now a clean no-op (was: "source table is empty; cannot determine index capacity" error): the auto-detect / cutoff branch sets srcEmpty=true and per-row / end() short-circuit.

The CDC bytes layout reuses the existing cuvscdc.EncodeEventRecord + FrameCdcChunk + CdcAppendEventsSql primitives so replay decodes identically. INCLUDE-column bytes are produced by a new encodeIncludeRowFromArgVecs sibling next to appendFilterRow, matching the cuvscdc.EncodeIncludeRow on-wire layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
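The cutoff rule in this commit message can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the function name `cdcCutoff` and its parameter list are hypothetical, `lastChunkSize` is taken to be `total % indexCapacity`, and a trailing chunk of size 0 (an exact multiple of the capacity) is assumed to mean "no partial chunk".

```go
package main

import "fmt"

// cdcCutoff returns the number of leading rows that still feed the index
// builder; rows at or beyond the cutoff would be buffered as CDC tail records.
// Hypothetical helper: names and the zero-remainder handling are assumptions.
func cdcCutoff(total, indexCapacity, threshold int64) int64 {
	if total <= 0 || indexCapacity <= 0 {
		return 0 // empty source: nothing to build, nothing to buffer
	}
	lastChunkSize := total % indexCapacity
	if lastChunkSize == 0 {
		return total // assumed: no partial trailing chunk exists
	}
	if lastChunkSize < threshold {
		return total - lastChunkSize // trailing chunk too small for the builder
	}
	return total
}

func main() {
	// 2050 rows, capacity 1000, minimum 128: the last 50 rows go to the tail.
	fmt.Println(cdcCutoff(2050, 1000, 128))
	// Whole dataset below the minimum: every row goes to the tail.
	fmt.Println(cdcCutoff(50, 1000, 128))
}
```

With these inputs the first call yields 2000 and the second yields 0, which matches the "whole dataset is too small" case described above.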

The CAGRA / IVF-PQ idxcron Hooks are thin wrappers that delegate to cuvsidxcron.CuvsUpdatable with a per-algo CuvsUpdatableSpec. Add focused tests that drive each wrapper through the IndexDef-missing and threshold-missing paths of the shared body — the error message and skip reason name the storage-table-type and threshold-param the spec asked about, so a regression to the wrong constant surfaces immediately. HNSW and fulltext don't participate in scheduled rebuilds; cover their trivial-true contract too so any future wiring keeps the "don't surprise-skip" guarantee. IVF-FLAT's full nsample/lists body suite already exists. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When the small-tail fallback writes all source rows to CDC tag=1 records without producing a tag=0 sub-index, the search-side loadCdcTail used to short-circuit ("cdc_tail data is moot without a main index") and ignore those records. Filtered queries against small-data-only indexes therefore returned empty results.

Persist the INCLUDE-column layout in a self-describing record at the start of chunk_id=0:

    CdcOpHeader (1) | payload_len (uint32 LE) | colMetaJSON

SaveSmallTailAsCdc prepends this header when colMetaJSON is non-empty (computed via the new colMetaJSONFromCols helper from the table-function's resolved []cuvsfilter.ColumnMeta). The header's self-describing length lets DecodeEventRecord skip past it without knowing includeBytesPerRow, and PeekColMetaJSON recovers the JSON without committing to dim/ibpr.

CagraSearch.loadCdcTail and IvfpqSearch.loadCdcTail no longer return early when no sub-index has loaded. They peek the header, derive includeBytesPerRow via cuvscdc.CdcIncludeBytesPerRow, replay the tag=1 events into a synthetic model, and stash the colMetaJSON on a new OverflowColMetaJSON field. buildOverflow falls back to that field when no main-index has a GetFilterColMetaJSON() to offer, so the brute-force FilterStore gets wired with INCLUDE-column metadata and filtered prefilter still works on small-data-only indexes.

ReplayEventLog also captures the header into ReplayState.ColMetaJSON for callers that prefer the unified result struct over the peek helper.

Empty-result invariant preserved: a header-only chunk with no event records produces no overflow → buildOverflow leaves s.Overflow nil → buildMultiIndex returns nil → Search returns []int64{}, []float64{}, nil. Both buildMultiIndex docstrings call out that this is the load-bearing path for "no main index + no brute-force → empty result" and that TestCagraSearchEmpty / TestIvfpqSearchEmpty pin it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
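To make the length-prefixed header record from this commit concrete, here is a minimal Go sketch of encoding and peeking such a record. It assumes a one-byte op tag whose value is 1, followed by a little-endian uint32 length and the JSON bytes; the function names and the tag value are illustrative assumptions, not the cuvscdc API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// opHeader is an assumed one-byte tag value for the header record.
const opHeader byte = 1

// encodeHeaderRecord lays out: op (1 byte) | payload_len (uint32 LE) | JSON.
func encodeHeaderRecord(colMetaJSON string) []byte {
	buf := make([]byte, 0, 5+len(colMetaJSON))
	buf = append(buf, opHeader)
	buf = binary.LittleEndian.AppendUint32(buf, uint32(len(colMetaJSON)))
	return append(buf, colMetaJSON...)
}

// peekHeaderRecord returns the JSON and the number of bytes the record
// occupies, so a decoder can skip it without knowing the per-row layout.
func peekHeaderRecord(chunk []byte) (colMetaJSON string, size int, ok bool) {
	if len(chunk) < 5 || chunk[0] != opHeader {
		return "", 0, false
	}
	n := int(binary.LittleEndian.Uint32(chunk[1:5]))
	if len(chunk) < 5+n {
		return "", 0, false // truncated record
	}
	return string(chunk[5 : 5+n]), 5 + n, true
}

func main() {
	rec := encodeHeaderRecord(`[{"name":"c1","type":"int32"}]`)
	chunk := append(rec, 0xAA, 0xBB) // pretend event records follow
	js, size, ok := peekHeaderRecord(chunk)
	fmt.Println(js, size, ok, len(chunk)-size)
}
```

The point of the explicit length is that the reader learns where the event records begin from the header alone.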

Replaces the CdcOpHeader record introduced in f28f73d with a dedicated header section in every chunk's frame. Frame format bumped to version 2:

    magic_start | version | payload_len | header_len | header | records | crc | reserved | reserved | magic_end

The header section carries colMetaJSON when the index has INCLUDE columns; payload_len covers only the event records (Delete/Insert, unchanged shape). header_len = 0 collapses the new section to nothing, matching the original 32-byte overhead.

Why the shape change:
- Records stay pure event payloads: no CdcOpHeader op, no special-case in DecodeEventRecord / ReplayEventLog. Decoders treat headers as frame metadata, not as records to skip.
- Every chunk is self-describing: any one chunk read in isolation knows its INCLUDE-column layout without depending on chunk_id ordering or whether chunk_id=0 is present.
- Fixes the empty-source-then-CDC edge case: when cagra_create with srcEmpty=true emits nothing, the first CagraSync.Save chunk (chunk_id=0, NextChunkIdSql) carries the header so search can decode it.

Surface changes:
- FrameCdcChunk(records, header []byte): new second arg.
- UnframeCdcChunk returns (records, header, err).
- CdcAppendEventsSql(..., colMetaJSON string): embeds the header in every emitted chunk.
- SaveSmallTailAsCdc just passes colMetaJSON through; no longer prepends a header record.
- CagraSync.Save / IvfpqSync.Save pass s.colMetaJSON to CdcAppendEventsSql so ongoing CDC iterations also embed it.
- ReplayEventLog captures the header from each chunk's frame into ReplayState.ColMetaJSON (last-write-wins; in practice all chunks share the same value).
- PeekColMetaJSON simplifies to "unframe chunks[0], return header".
- CdcOpHeader / EncodeHeaderRecord / CdcEventRecord.Header dropped.

Tests updated: existing FrameCdcChunk / UnframeCdcChunk callers take the new signature; the old "header as first record" small-tail tests are replaced by ones that assert the header lives in every chunk's frame.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Both producers (cuvscdc.ResolveIncludeColumns and the table-function helper colMetaJSONFromCols) now share one entry type and one marshal function:

    cuvscdc.ColMetaEntry{Name, Type}
    cuvscdc.MarshalColMetaJSON([]ColMetaEntry) (string, error)

The shared producer uses encoding/json so column names containing `"` or `\` (or any other JSON-significant character) escape correctly; the previous strings.Builder paths would have emitted invalid JSON for such names. New TestMarshalColMetaJSON_EscapesNames pins that contract by round-tripping a name containing each special character through encoding/json.

Single producer also guarantees the iscp writer side (ResolveIncludeColumns at index-CDC-event-write time) and the table-function side (small-tail emit at build time) cannot drift: any future shape change lands in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
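The escaping argument can be illustrated with a short round trip through encoding/json. This is a sketch only: the struct field types and JSON keys below are assumptions, and the lower-case function names are stand-ins rather than the real cuvscdc functions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ColMetaEntry mirrors the {Name, Type} shape named in the commit message.
// The string field types and the JSON keys are assumptions for illustration.
type ColMetaEntry struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

// marshalColMetaJSON lets encoding/json handle quoting and escaping.
func marshalColMetaJSON(entries []ColMetaEntry) (string, error) {
	b, err := json.Marshal(entries)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// unmarshalColMetaJSON is the inverse, used here to check the round trip.
func unmarshalColMetaJSON(s string) ([]ColMetaEntry, error) {
	var out []ColMetaEntry
	err := json.Unmarshal([]byte(s), &out)
	return out, err
}

func main() {
	in := []ColMetaEntry{{Name: `we"ird\name`, Type: "int32"}}
	s, err := marshalColMetaJSON(in)
	if err != nil {
		panic(err)
	}
	back, err := unmarshalColMetaJSON(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
	fmt.Println(back[0].Name == in[0].Name)
}
```

A hand-rolled strings.Builder producer would need to reproduce these escaping rules itself, which is the drift risk the commit removes.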

Replace the unreliable resolver-error probe used to detect background re-entry (idxcron ALTER REINDEX, ProcessInitSQL) with an explicit proc.Base.IsFrontend flag carried via executor.Options.WithFrontend. Default is background; frontend opts in at the two session-bound proc-construction sites (mysql client query handler and back_exec). BuildIdxcronMetadata, ddl.go AlterTableInplace re-registration, and the experimental_xxx_index gates in cagra/ivfpq/hnsw now consult ctx.IsFrontend() instead of probing a resolver, so background re-entry no longer clobbers captured task metadata or trips an experimental-flag check that already passed at CREATE INDEX time. The dead probe-based FrontendProbeVar / IdxcronFrontendProbeVar fields are removed in the same pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three comments in types.go and sqlexec.go still spoke of "IsBackground=true" / "WithIsBackground(false)" — relics of the prior name. Reworded to match the post-rename API (IsFrontend / WithFrontend) so the in-file docstrings line up with the code. No behaviour change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add [plugin] / [isfrontend] tagged logutil.Info calls at each plugin lifecycle milestone so SQL-driven end-to-end tests can confirm via the CN log that the right algorithm's hook ran with the expected context.

Covered points:
- compile.handleCreate / HandleCreateIndex (cagra, ivfpq, ivfflat, hnsw): logs isFrontend / forceSync / def-count at entry; this proves the per-algo gate and forceSync decision.
- compile.HandleDropIndex (all four): logs entry on DROP INDEX.
- compile.IdxcronMetadata (cagra, ivfpq, ivfflat): per-algo entry log pairs with the existing shared BuildIdxcronMetadata capture/skip [isfrontend] lines.
- idxcron.Updatable (all four): logs every cron-tick decision.
- iscp.NewIndexSqlWriter: single central log fires once per CDC consumer construction across all algos.
- cuvs Sync.AppendRecords / Sync.Save (cagra, ivfpq): logs records IN from the CDC stream and OUT to the storage table, so flush cadence and chunk count are visible in the log.

Smoke test files added for ivfflat and hnsw plugin/compile/ so the new log lines stay covered (ivfflat went 0% → 7.1%, hnsw 0% → 13.9%; cagra/ivfpq held at 79.6%). All other touched packages held or improved coverage. Build + vet clean on both default and gpu tag sets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…faultResolveVariable hook

Two fixes plus a structural cleanup landed together.

1. runSqlWithOptions (pkg/sql/compile/compile.go) now propagates proc.Base.IsFrontend onto the executor.Options it spawns, mirroring the existing resolver propagation. Without this, sub-Compiles spawned for internal sub-SQL (ALTER TABLE COPY's CreateTmpTableSql in particular) defaulted to IsFrontend=false even when the outer caller was user-driven. Downstream code that gates on ctx.IsFrontend() / proc.Base.IsFrontend then silently misfired, notably CreateAllIndexUpdateTasks, which would receive metadata=nil from BuildIdxcronMetadata and write '' into mo_index_update's JSON column, tripping the BVT 'invalid input: json text' error on `ALTER TABLE tbl ADD c vecf32(3)` against an IVFFLAT-indexed table.

2. CreateAllIndexUpdateTasks (pkg/sql/compile/iscp_util.go) replaces the unreliable `GetResolveVariableFunc() == nil` "background" heuristic with `!c.proc.Base.IsFrontend`. Defensive belt-and-suspenders for any future sub-Compile path that doesn't propagate IsFrontend correctly: the audit found no other downstream consumer of the resolver-nil heuristic that breaks under the DefaultResolveVariable fallback, but this guards the empty-JSON regression at the registration site itself.

3. DefaultResolveVariable moves from pkg/iscp/sysvars.go (deleted) into pkg/util/executor/default_resolve_variable.go (new). All three consumers, pkg/frontend (writer), pkg/iscp (reader) and pkg/sql/compile (reader), already imported pkg/util/executor, so it's the lowest common ancestor with zero cycle risk. The hook now lives alongside Options.WithResolveVariableFunc and proc.Base.IsFrontend, which all turn on the same axis: "does this proc have a session-bound resolver?". Doc-comments in pkg/vm/process/types.go, pkg/sql/compile/sql_executor.go, and pkg/indexplugin/compile/hooks.go are updated; the wiring test moves to TestDefaultResolveVariableWired.

Build + vet clean on default and gpu tag sets; pkg/frontend test TestDefaultResolveVariableWired passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add a session sysvar `gpu_mode` that lets a -tags gpu binary route vector-index work (brute force, kmeans, adhoc brute force, pairwise distance) through the CPU fallback paths instead of cuvs CUDA. The build tag still drives the default: true under -tags gpu, false otherwise. An operator opts out per session via `SET gpu_mode = 0` to exercise CPU paths on the same binary for testing, benchmarking, or operator-controlled fallback.

Implementation:
- New leaf package pkg/util/gpumode/ declares `GpuMode bool` and two helpers: `EffectiveGpuMode(resolver)` (reads the sysvar via the proc's resolver, falls back to GpuMode) and `GpuModeDefaultInt8()` (the bool→int8 conversion the sysvar Default field needs). A //go:build gpu init() flips GpuMode to true; the non-gpu build relies on the zero value.
- Six factory signatures grow a trailing `gpuMode bool` parameter: brute_force.{NewBruteForceIndex, NewAdhocBruteForceIndex, NewAdhocBruteForceIndexFlattened}, device.NewKMeans, metric.{PairWiseDistance, PairwiseDistanceLaunch}. The gpu.go bodies bail to the existing CPU bodies when !gpuMode; cpu.go variants accept-and-ignore the new param.
- Four production callers compute the effective mode and pass it through: productl2.getIndex, ivfflat.LoadCentroids, ivf_create.clustering, and func_binary.batchArrayDistanceSync (which grew a proc parameter to reach the resolver from five SQL distance function entry points).
- Sysvar registered in pkg/frontend/variables.go with Default = gpumode.GpuModeDefaultInt8() (read at variables.go init time, which runs after pkg/util/gpumode's init() so the default matches the binary's build tag).

Adhoc brute force keeps its 5000-element CPU threshold; gpu_mode=true means "GPU dispatch is allowed," not "always GPU." gpu_mode=false skips the threshold entirely and goes straight to Usearch.

Build + vet clean on both default and -tags gpu; gpumode package unit tests cover the nil resolver / int8 on/off / error / nil-value / unexpected-type paths plus the build-tag-driven init flip (90.9% default, 91.7% gpu). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
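A rough sketch of how an effective-mode helper of this kind could behave is shown below. It is not the pkg/util/gpumode source: the resolver signature, the lower-case names, and the choice to fall back to the build default on error, nil value, or unexpected type are all assumptions inferred from the test paths listed above.

```go
package main

import (
	"errors"
	"fmt"
)

// resolveFunc is an assumed shape for a sysvar resolver.
type resolveFunc func(name string, isSystemVar, isGlobalVar bool) (any, error)

// effectiveGpuMode reads gpu_mode through the resolver and falls back to the
// build-time default whenever the session gives no usable answer (assumed).
func effectiveGpuMode(resolve resolveFunc, buildDefault bool) bool {
	if resolve == nil {
		return buildDefault
	}
	v, err := resolve("gpu_mode", true, false)
	if err != nil || v == nil {
		return buildDefault
	}
	if on, ok := v.(int8); ok {
		return on != 0
	}
	return buildDefault // unexpected type: keep the binary's default
}

// gpuModeDefaultInt8 converts the bool default into the int8 a sysvar
// Default field would carry.
func gpuModeDefaultInt8(buildDefault bool) int8 {
	if buildDefault {
		return 1
	}
	return 0
}

func main() {
	off := func(string, bool, bool) (any, error) { return int8(0), nil }
	broken := func(string, bool, bool) (any, error) { return nil, errors.New("no session") }
	fmt.Println(effectiveGpuMode(nil, true))    // no resolver: build default
	fmt.Println(effectiveGpuMode(off, true))    // SET gpu_mode = 0 on a gpu binary
	fmt.Println(effectiveGpuMode(broken, true)) // resolver error: build default
	fmt.Println(gpuModeDefaultInt8(true))
}
```

Passing the build default as a parameter keeps the sketch testable without build tags; the real package is described as holding it in a package-level variable.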

CREATE TABLE … CLONE failed with `invalid input: BuildIdxcronMetadata: variable "ivf_threads_build" has unsupported type <nil>` when the clone spawned a sub-Compile whose session lookup returned (nil, nil) for a registered-but-not-session-set sysvar. The default branch of the type switch then errored on nil. Skip nil values instead — the idxcron consumer's task.Metadata.ResolveVariableFunc already falls back to its own compile-time default when a var isn't present in the captured blob, so skipping is the equivalent of "no captured value, use default." Same semantics as if the var weren't in the Capture list. New TestCagraIdxcronMetadata_NilValueSkipped covers the regression; existing _Frontend/_Background tests keep passing (83.2% coverage held). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Supersedes the per-variable nil-skip from ab48d0c with the proper fix: a FrontendProbeVar that gates the whole Capture pass on whether the inherited resolver can surface a known vector-index sysvar.

The prior nil-skip treated the symptom (one var resolves to nil and the rest succeed → write a partial blob). The actual scenario is a sub-Compile spawned via runSqlWithOptions (e.g. CREATE TABLE CLONE) that inherits the frontend session's resolver AND IsFrontend=true, but the resolver is partial in that context: multiple captured vars silently return (nil, nil). Writing a partial metadata blob in that state risks emitting a structure the idxcron executor's task.Metadata.ResolveVariableFunc can't reason about correctly at firing time.

The probe, a known per-algo sysvar that resolves cleanly in a true frontend session but returns nil in the partial sub-Compile, short-circuits the whole capture: BuildIdxcronMetadata returns (nil, nil) and the consumer falls back to compile-time defaults.

Probe vars:
- IVF-FLAT → "ivf_threads_search"
- CAGRA → "cagra_threads_search"
- IVF-PQ → "ivfpq_threads_search"

Empty FrontendProbeVar means "no probe": Capture is always resolved (used by plugins whose Capture is empty or who don't need the gate).

TestCagraIdxcronMetadata_NilValueSkipped replaced by TestCagraIdxcronMetadata_ProbeFail covering the all-or-nothing semantics. Existing _Frontend/_Background tests keep passing (83.2% coverage held).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
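The all-or-nothing probe gate can be sketched as follows. Everything here is illustrative: the simplified one-argument resolver, the function name `captureMetadata`, and the decision to propagate a probe error are assumptions, not the BuildIdxcronMetadata implementation.

```go
package main

import "fmt"

// resolver is a simplified, assumed lookup function for session variables.
type resolver func(name string) (any, error)

// captureMetadata resolves every name in capture, but only if the probe
// variable resolves to a non-nil value first. An empty probeVar disables
// the gate. A nil map result means "use compile-time defaults".
func captureMetadata(resolve resolver, probeVar string, capture []string) (map[string]any, error) {
	if resolve == nil {
		return nil, nil
	}
	if probeVar != "" {
		v, err := resolve(probeVar)
		if err != nil {
			return nil, err // assumed: probe errors are surfaced
		}
		if v == nil {
			return nil, nil // partial resolver: skip the whole capture
		}
	}
	out := make(map[string]any, len(capture))
	for _, name := range capture {
		v, err := resolve(name)
		if err != nil {
			return nil, err
		}
		out[name] = v
	}
	return out, nil
}

func main() {
	full := func(name string) (any, error) { return int64(4), nil }
	partial := func(name string) (any, error) { return nil, nil }
	capture := []string{"ivf_threads_build", "kmeans_train_percent"}

	m, _ := captureMetadata(full, "ivf_threads_search", capture)
	fmt.Println(len(m))
	m, _ = captureMetadata(partial, "ivf_threads_search", capture)
	fmt.Println(m == nil)
}
```

The gate trades a partially filled blob for no blob at all, which is the behavior the commit message argues the consumer can reason about.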

When proc.GetResolveVariableFunc() returns (nil, nil) for a system variable (the per-session sysvar map miss that happens when a sub-Compile such as CREATE TABLE CLONE or internal SQL with a propagated session resolver calls a sysvar that's registered in gSysVarsDefs but was never explicitly SET at the global/account level), fall through to executor.DefaultResolveVariable instead of returning nil.

Background: ses.sesSysVars is a clone of the per-account snapshot from mo_mysql_compatibility_mode. Sysvars added to gSysVarsDefs without a corresponding catalog row are absent from the per-session map, and SystemVariables.Get returns interface{}(nil) on map miss, not the registered Default. Surfacing the gSysVarsDefs Default via executor.DefaultResolveVariable (already wired by pkg/frontend init) matches the per-var hardcoded-default fallback gpu_async_search used in getIvfflatMetadata.

Net effect on the CLONE-table idxcron path:
- BuildIdxcronMetadata's probe (ivf_threads_search) now resolves to int64(0) via the fallback instead of nil → probe gate passes → capture proceeds.
- Each captured var (ivf_threads_build, kmeans_train_percent, …) follows the same path → metadata blob populated with the registered defaults rather than nil.
- idxcron.RegisterUpdate gets non-empty JSON → mo_index_update accepts the row → cloned table has its idxcron task wired up with sensible defaults.

The probe gate in BuildIdxcronMetadata stays in place as defensive belt-and-suspenders for test/unit paths where executor.DefaultResolveVariable isn't wired (no blank import of pkg/frontend). In production the fallback shadows it; in tests the gate short-circuits cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… map miss

Session.GetSessionSysVar previously returned interface{}(nil) when ses.sesSysVars.Get(name) saw a map miss for a registered sysvar. sesSysVars is a clone of the per-account snapshot from mo_mysql_compatibility_mode; sysvars added to gSysVarsDefs without a corresponding catalog row are absent from the cloned map, and Get returns interface{}(nil) on map miss instead of the registered Default.

That violates MySQL `SELECT @@name` semantics (session value > global default, never nil for a registered name) and breaks downstream consumers like sub-Compiles spawned by CREATE TABLE CLONE that try to read vector-index sysvars (ivf_threads_build, kmeans_train_percent, ...): they receive nil and either fail or silently use zero values.

The function already had a wholesale-nil fallback (`if ses.sesSysVars == nil { return gSysVarsDefs[name].Default }`); this commit extends it to cover the per-key map-miss case, the realistic scenario for any sysvar registered after the per-account snapshot was taken.

New TestGetSessionSysVar_MapMissFallsBackToDefault asserts both `ivf_threads_build` (int64 default 0) and `kmeans_train_percent` (float64 default 10) resolve correctly when the per-session map is empty.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
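The per-key fallback described in this commit boils down to a two-level lookup. The sketch below uses plain maps as stand-ins for ses.sesSysVars and gSysVarsDefs; the type and function names are hypothetical, and only the two defaults quoted above are taken from the commit message.

```go
package main

import "fmt"

// sysVarDef is a stand-in for a registered sysvar definition.
type sysVarDef struct {
	Default any
}

// registeredDefs stands in for the global registry; the two defaults are the
// ones quoted in the commit message.
var registeredDefs = map[string]sysVarDef{
	"ivf_threads_build":    {Default: int64(0)},
	"kmeans_train_percent": {Default: float64(10)},
}

// getSessionSysVar prefers the session value and falls back to the registered
// default on a map miss (or a nil session map), so a registered name never
// resolves to nil.
func getSessionSysVar(session map[string]any, name string) (any, bool) {
	if v, ok := session[name]; ok && v != nil {
		return v, true
	}
	if def, ok := registeredDefs[name]; ok {
		return def.Default, true
	}
	return nil, false // unregistered name
}

func main() {
	empty := map[string]any{}
	v, _ := getSessionSysVar(empty, "kmeans_train_percent")
	fmt.Println(v)

	set := map[string]any{"ivf_threads_build": int64(8)}
	v, _ = getSessionSysVar(set, "ivf_threads_build")
	fmt.Println(v)
}
```

Reading from a nil Go map is safe, so the same code path covers both the wholesale-nil and the per-key miss cases in this sketch.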

Pulls in the upstream-targeted GetSessionSysVar fallback fix so that session sysvar lookups for vector-index vars (ivf_threads_build, kmeans_train_percent, …) return their gSysVarsDefs defaults instead of nil on a per-session-map miss. Together with the existing workarounds on this branch (resolveVariableOrDefault fallback, FrontendProbeVar gate, IsFrontend gates in iscp_util.go), CLONE-table idxcron registration now lands cleanly even before the upstream PR merges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

SCA flagged it as dead — defined once in cuvs_writer_test.go but never called from any test. Likely left over from an earlier test that was rewritten to not need a ConsumerInfo factory. Drop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…r id

Follow-up to the drop-index cache-eviction fix: HNSW's HandleDropIndex was still a no-op, so with the new dispatch its cached search index lingered until the 5-min VectorIndexCacheTTL (same leak as ivfpq/cagra/ivfflat). Evict via cache.Cache.Remove(storageDef.IndexTableName), mirroring the create-side. All four vector plugins now release on drop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…xes CLONE)

CREATE TABLE ... CLONE of a table with a CAGRA/IVF-PQ vector index failed with "VECTOR column 'v' cannot be in index": indexColumnCheckKind mapped only IVFFLAT/HNSW (CAGRA/IVFPQ fell to "secondary"), and checkIndexColumnSupportability hardcoded the vector allowlist to ivfflat/hnsw and only matched f32/f64 (narrow f16/bf16/int8/uint8 fell through unvalidated).

Delegate the vector-column check to the per-plugin catalog hook (catalog.SupportsVectorType / SupportedVectorTypes) so each algorithm's real supported element types are enforced: ivfflat = f32/f64/f16/bf16/int8/uint8, cagra/ivfpq = f32/f16, hnsw = f32/f64; non-vector index kinds reject vector columns. indexColumnCheckKind now maps cagra/ivfpq so Get() resolves the plugin.

Verified: gpu_cases/vector BVT 100% (vector_clone_idxcron now 21/21) + unit tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
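The per-algorithm allowlist above can be pictured as a small lookup table. The sketch below is illustrative only: the map, the string spellings of algorithms and element types, and the function name are assumptions, while the type sets themselves are copied from the commit message.

```go
package main

import "fmt"

// supportedVectorTypes lists element types per index algorithm, as stated in
// the commit message. Keys and spellings are illustrative.
var supportedVectorTypes = map[string][]string{
	"ivfflat": {"f32", "f64", "f16", "bf16", "int8", "uint8"},
	"cagra":   {"f32", "f16"},
	"ivfpq":   {"f32", "f16"},
	"hnsw":    {"f32", "f64"},
}

// supportsVectorType reports whether the algorithm accepts a vector column
// with the given element type. Unknown (non-vector) index kinds reject all
// vector columns.
func supportsVectorType(algo, elemType string) bool {
	for _, t := range supportedVectorTypes[algo] {
		if t == elemType {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(supportsVectorType("cagra", "f16"))
	fmt.Println(supportsVectorType("hnsw", "int8"))
	fmt.Println(supportsVectorType("secondary", "f32"))
}
```

Keeping the table next to each plugin, rather than in one hardcoded check, is what lets a new algorithm declare its own types without touching the shared validation.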

XuPeng-SH

I re-checked the current head and still see two substantive correctness issues in the cuVS quantizer path.

1. The 1-byte quantizer fallback still learns from already-quantized bytes.
In cgo/cuvs/index_base.hpp, train_quantizer_if_needed() still auto-trains from flattened_host_dataset after explicitly warning that this buffer may already hold int8/uint8 storage values when data came in through the public storage-typed constructors / add-chunk path. In that case the quantizer learns the compressed range, not the original float range, so later base-typed search/extend paths can silently quantize against the wrong min/max.

Suggestion: do not auto-train from storage-typed data. Require either an explicit quantizer/range or original base-typed training data before enabling base-typed search/extend on pre-quantized indexes.

2. The new “strided sample” still ignores the tail for 501–999 row builds.
With n_train = min(500, count) and stride = count / n_train, any 501 <= count < 1000 still collapses to stride == 1, so the sampling loop only visits rows 0..499. That means extrema in the tail are still missed, even though the comment now claims the sampler covers all rows.

Suggestion: choose indices proportionally across the full range (for example r = j * (count - 1) / (n_train - 1)) or switch to a true uniform/reservoir sampler.
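The stride collapse is easy to reproduce outside the C++ code. The Go sketch below is an illustration, not the cuvs sampler: it assumes the sampling loop visits row j * stride, and it implements the proportional formula from the suggestion with a guard for n_train == 1.

```go
package main

import "fmt"

// stridedIndices mimics the reviewed scheme: an integer stride of
// count / nTrain, visiting rows j*stride (assumed loop shape).
func stridedIndices(count, maxTrain int) []int {
	nTrain := maxTrain
	if count < nTrain {
		nTrain = count
	}
	if nTrain <= 0 {
		return nil
	}
	stride := count / nTrain
	idx := make([]int, nTrain)
	for j := range idx {
		idx[j] = j * stride
	}
	return idx
}

// proportionalIndices spreads samples over the full range using
// r = j * (count - 1) / (nTrain - 1), so the last row is always visited.
func proportionalIndices(count, maxTrain int) []int {
	nTrain := maxTrain
	if count < nTrain {
		nTrain = count
	}
	if nTrain <= 0 {
		return nil
	}
	if nTrain == 1 {
		return []int{0}
	}
	idx := make([]int, nTrain)
	for j := range idx {
		idx[j] = j * (count - 1) / (nTrain - 1)
	}
	return idx
}

func main() {
	s := stridedIndices(750, 500)
	p := proportionalIndices(750, 500)
	fmt.Println("strided last row:", s[len(s)-1])
	fmt.Println("proportional last row:", p[len(p)-1])
}
```

For count = 750 the strided variant stops at row 499, while the proportional variant reaches row 749, which is the tail coverage the review asks for.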

I would keep this at request changes until those two are addressed, because both can directly bias quantization and search quality without any obvious runtime failure.

A WHERE predicate on a column not in the index INCLUDE list cannot be pushed into the GPU bitset; the planner runs the ANN search for a candidate window then JOINs+filters at the DB (post-filter). This path had no BVT coverage — all existing filter cases only filter on INCLUDE'd columns. Add vector_{cagra,ivfpq}_postfilter.sql: establish the unfiltered ranked result, then verify the post-filtered result equals exactly the unfiltered rows that satisfy the predicate (exact when LIMIT >= row count so the candidate window covers all rows), plus the mixed pre(INCLUDE)+post(non-INCLUDE) case and the small-LIMIT approximate-window case (far match falls outside the window). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

aunjgr

LGTM for the quantization support. Well-structured across cuvs C++/CUDA layer, Go bindings, and SQL compilation.

For a 1-byte storage type that buffer only ever holds STORAGE bytes (raw T from a pre-quantized add_chunk(T*), or post-flush quantized output), never original floats. Training the scalar quantizer on it learns the COMPRESSED range (e.g. int8 [-128,127]) instead of the true float range, so later base-typed search/extend silently quantizes against the wrong min/max. Quantizer training now happens solely in flush_pending_float_chunks_internal() on the ORIGINAL floats buffered by add_chunk_float()/add_chunk_quantize(). A pre-quantized index leaves the quantizer untrained; base-typed search (quantize_query) and extend (upload_float_matrix_as_T) already throw "quantizer not trained", so the op fails loudly instead of mis-quantizing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
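A toy example shows why training on storage bytes is harmful. The quantizer below is a deliberately simplified linear min/max mapping written for illustration; it is not the cuvs scalar quantizer, and all names are hypothetical. It also mirrors the "fail loudly when untrained" behavior described above.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// scalarQuantizer is a simplified min/max quantizer (assumption: the real
// quantizer may use a different range estimate).
type scalarQuantizer struct {
	min, max float64
	trained  bool
}

// train learns the value range from the given samples.
func train(samples []float64) scalarQuantizer {
	q := scalarQuantizer{min: math.Inf(1), max: math.Inf(-1)}
	for _, v := range samples {
		q.min = math.Min(q.min, v)
		q.max = math.Max(q.max, v)
	}
	q.trained = len(samples) > 0
	return q
}

// quantize maps x linearly from [min, max] onto the int8 range.
func (q scalarQuantizer) quantize(x float64) (int8, error) {
	if !q.trained {
		return 0, errors.New("quantizer not trained")
	}
	if q.max <= q.min {
		return 0, nil // degenerate range: every value maps to one code
	}
	t := (x - q.min) / (q.max - q.min)
	t = math.Max(0, math.Min(1, t))
	return int8(math.Round(t*255) - 128), nil
}

func main() {
	fromFloats := train([]float64{-1, -0.2, 0.4, 1}) // original float data
	fromCodes := train([]float64{-128, -26, 51, 127}) // already-quantized bytes

	lo, _ := fromFloats.quantize(-1)
	hi, _ := fromFloats.quantize(1)
	fmt.Println("trained on floats:", lo, hi)

	lo, _ = fromCodes.quantize(-1)
	hi, _ = fromCodes.quantize(1)
	fmt.Println("trained on codes: ", lo, hi)
}
```

When the range is learned from codes, float inputs spanning the whole original range land on nearly the same int8 value, so distances between quantized queries and stored vectors lose their meaning without any runtime error.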

fix

Concurrent USearch add() could orphan HNSW graph nodes (a vector stored but never linked, so search() couldn't reach it -> flaky recall@1). MatrixOne matrixorigin#24849 worked around it by forcing single-threaded builds everywhere. The root cause is fixed upstream-style in our vendored libusearch (two-pass add: form ALL forward links before ANY reverse link, so a node is never reachable as a descent seed while a lower level is still empty), so the workaround is no longer needed.

- thirdparties/usearch-2.25.3.tar.gz: patched index.hpp with the matrixorigin#735 fix (pristine v2.25.3 source, only index.hpp/test.cpp changed; CMakeLists still march=native so the Makefile's sed applies as before).
- build.go: drop the hardcoded `nthread := 1`; restore the real concurrency estimate (GetConcurrency / GetConcurrencyForBuild from nworker/ThreadsBuild).
- sync.go: CDC/sync paths use GetConcurrencyForBuild directly.
- types.go: remove the GetConcurrencyForSingleThreadBuild stopgap.
- zz_orphan_test.go: enable TestZZBuildOrphan as a regression guard: 30x 8-thread builds with the BVT t2 params (M 64, EF_CONSTRUCTION/SEARCH 200), rotating insertion order each run to mimic `load data ... parallel 'true'`; asserts 0 orphans. Auto-skips without the SIFT fixture (~5s when present).

Validated: zz_orphan_test 0/30 multi-threaded (was ~1/30 pre-fix); 1M wiki_all HNSW build clean (recall@10 82% at M=8); vector_hnsw_async t2 BVT 30/30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fix

Concurrent USearch add() could orphan HNSW graph nodes (a vector stored but never linked, so search() couldn't reach it -> flaky recall@1). MatrixOne matrixorigin#24849 worked around it by forcing single-threaded builds everywhere. The root cause is fixed upstream-style in our vendored libusearch (two-pass add: form ALL forward links before ANY reverse link, so a node is never reachable as a descent seed while a lower level is still empty), so the workaround is no longer needed.

- thirdparties/usearch-2.25.3.tar.gz: patched index.hpp with the matrixorigin#735 fix (pristine v2.25.3 source, only index.hpp/test.cpp changed; CMakeLists still march=native so the Makefile's sed applies as before).
- build.go: drop the hardcoded `nthread := 1`; restore the real concurrency estimate (GetConcurrency / GetConcurrencyForBuild from nworker/ThreadsBuild).
- sync.go: CDC/sync paths use GetConcurrencyForBuild directly.
- types.go: remove the GetConcurrencyForSingleThreadBuild stopgap.
- zz_orphan_test.go: enable TestZZBuildOrphan as a regression guard: 30x 8-thread builds with the BVT t2 params (M 64, EF_CONSTRUCTION/SEARCH 200), rotating insertion order each run to mimic `load data ... parallel 'true'`; asserts 0 orphans. Auto-skips without the SIFT fixture (~5s when present).
- vector_hnsw_async.sql/.result: bump t2's post-build wait sleep(20)->sleep(30) so the async index is reliably visible before the NN query (the build's model becomes searchable a beat after sleep(20) under load; this is a visibility/timing flake, not an orphan).

Validated: zz_orphan_test 0/30 multi-threaded (was ~1/30 pre-fix); 1M wiki_all HNSW build clean (recall@10 82% at M=8); vector_hnsw_async 5/5 at 100%.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…o usearch_build_fix

NewHnswBuild's nworker > 1 branch (GetConcurrencyForBuild / nworker) was never exercised — all existing tests passed nworker = 1 — leaving it uncovered and dropping PR coverage below the 75% gate. Add TestBuildMultiWorker which builds with nworker = 2 to hit that branch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ent build

Restoring multi-threaded HnswBuild re-enabled two lifecycle races:

Blocker 1: lost worker errors + producer hang. A worker failing on the last queued vector failed after Add() already returned nil, and finalization never drained err_chan, so a corrupt build reported success. Add()'s enqueue was also an unconditional blocking send, so a full buffer blocked the producer forever once workers died. Replace the poll-once err_chan with a first-error record (workerErr) plus a `stopped` channel closed on the first failure or context cancellation. Add() now selects on the send vs <-stopped (can't block once workers are gone, surfaces the error); CloseAndWait() returns the recorded error; ToInsertSql()/Destroy() propagate it up through the existing hnsw_create error path.

Blocker 2: rollover saves/destroys an index with in-flight adds. A worker crossing IndexCapacity received the previous index as save_idx and called SaveToFile() (which saves AND destroys idx.Index) outside the lock, racing peer workers still doing idx.Add() on it (use-after-destroy / partial save; observed as "usearch index is nil"). Add a per-index in-flight WaitGroup on HnswModel: reserve the slot under the same lock that decides rollover (getIndexForAdd), release after the add, and Wait() it before SaveToFile(). The rolled-over index gets no new adds, so the wait converges; the crossing worker's own add targets the new index, so no self-deadlock.

Regressions (both fail on the old code, pass with the fix, run under -race):
- TestBuildMultiWorkerLastItemError: dim mismatch on the final queued vector must surface from ToInsertSql().
- TestBuildMultiWorkerRollover: small IndexCapacity + 8 workers + 1000 adds forcing ~50 rollovers; all keys survive and finalization succeeds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
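The Blocker 1 pattern (first-error record plus a `stopped` channel, with the producer selecting on send versus stop) can be sketched in isolation. This is a minimal stand-alone model, not the HnswBuild code: the `build` type, `newBuild`, and the int payload are hypothetical, and context cancellation is left out.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errBadVector = errors.New("dimension mismatch")

// build is a toy producer/worker pipeline with a first-error record.
type build struct {
	queue   chan int
	stopped chan struct{} // closed exactly once, on the first worker failure
	once    sync.Once
	err     error // written before close(stopped), read only after it
	wg      sync.WaitGroup
}

func newBuild(nworker, buffer int, work func(int) error) *build {
	b := &build{queue: make(chan int, buffer), stopped: make(chan struct{})}
	for i := 0; i < nworker; i++ {
		b.wg.Add(1)
		go func() {
			defer b.wg.Done()
			for {
				select {
				case <-b.stopped:
					return
				case v, ok := <-b.queue:
					if !ok {
						return
					}
					if err := work(v); err != nil {
						b.fail(err)
						return
					}
				}
			}
		}()
	}
	return b
}

// fail records the first error and wakes every blocked sender and worker.
func (b *build) fail(err error) {
	b.once.Do(func() {
		b.err = err
		close(b.stopped)
	})
}

// Add never blocks forever: once workers have stopped it returns the error.
func (b *build) Add(v int) error {
	select {
	case b.queue <- v:
		return nil
	case <-b.stopped:
		return b.err
	}
}

// CloseAndWait drains the workers and reports an error that occurred after
// the last Add had already returned nil.
func (b *build) CloseAndWait() error {
	close(b.queue)
	b.wg.Wait()
	select {
	case <-b.stopped:
		return b.err
	default:
		return nil
	}
}

func main() {
	b := newBuild(2, 1, func(v int) error {
		if v == 9 {
			return errBadVector // the last queued item fails
		}
		return nil
	})
	for i := 0; i < 10; i++ {
		_ = b.Add(i)
	}
	fmt.Println(b.CloseAndWait())
}
```

Because the error is written before the channel is closed and read only after a receive from it, the Go memory model orders the two without an extra mutex in this sketch.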

Conflict in the generated mysql_sql.go (both sides changed the grammar / keywords). Resolved mysql_sql.y and regenerated mysql_sql.go with goyacc; the regenerated parser is byte-identical to a fresh conflict-free build and all parser + frontend packages compile. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- go.mod: go directive 1.25.4 -> 1.26.4 (required by the goexperiment.simd build tag used by pkg/vectorindex/metric).
- Makefile: on x86_64 the arch-specific SIMD kernels are now built by default via a single ARCHSIMD flag (default 1) -> GOAMD64=v3 GOEXPERIMENT=simd. Disable with `make ARCHSIMD=0 build`; GOAMD64 stays independently overridable (e.g. `make GOAMD64=v4 build`). The SIMD kernels runtime-dispatch (AVX-512 -> AVX2 -> scalar), so GOAMD64=v3 only sets the portable baseline floor (Haswell-class), not the vector path.
- Dockerfiles: bump base images to the prepared 1.26.4 tags (matrixorigin/golang:1.26.4-ubuntu22.04, matrixorigin/tester:go1.26.4-jdk8); `make build` inside the image picks up the new SIMD default automatically.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sonic v1.15.0 fails to compile under Go 1.26. Bump to v1.15.2 (and its loader to v0.5.1), the latest release, which builds cleanly with go1.26.4. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The arch-specific SIMD distance kernels (FMA) make an exact-match cosine distance 1.1e-16 instead of 0.0. mo-tester's comparator treats 0-vs-nonzero as a hard mismatch (no tolerance), so the UNION mode=pre case in vector_ivf_mode.sql failed under the archsimd build. Wrap the projected distances in round(dist,4) (drives the 1.1e-16 cell to exactly 0; round() wraps only the projection so the ORDER BY keeps the raw distance and the ivfflat index is still used -- verified via EXPLAIN), and change the outer ORDER BY id -> ORDER BY dist, id to give the UNION a deterministic row order. Update the .result to match. vector_ivf_mode.sql 87/87 and vector_ivf_mode_advanced.sql 41/41 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… ivfpq/cagra (Go) CAGRA had no quantize test at all (ivf_pq already covered half->int8/uint8). Add GpuCagraTest::HalfQuantizeToInt8Build / HalfQuantizeToUint8Build mirroring ivf_pq: train the native half-source scalar quantizer, transform half->int8/uint8, build a CAGRA graph over the codes, and search with both a native-T query and a half query routed through quantize_query. Verified on GPU (169/169 cuvs tests pass). Add pkg/cuvs/search_f16quant_test.go (gpu tag): the Float16->int8 / Float16->uint8 build+search path for IVF-PQ and CAGRA via AddChunkQuantize/SearchQuantize, the combo the float32-base info_test matrix did not exercise. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
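The train / transform / quantize-query flow these tests exercise can be pictured with a generic min/max scalar quantizer. The sketch below is an assumption for illustration only: it uses float32 input because Go has no native half type, and a simple linear mapping that may differ from what the cuvs scalar quantizer actually does. `scalarQuantizer`, `train`, `toInt8` and `toUint8` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// scalarQuantizer maps the trained range [Min, Max] linearly onto 256 codes.
type scalarQuantizer struct {
	Min, Max float32
}

// train scans the data once and records its range. It assumes data is non-empty.
func train(data []float32) scalarQuantizer {
	q := scalarQuantizer{Min: data[0], Max: data[0]}
	for _, v := range data[1:] {
		if v < q.Min {
			q.Min = v
		}
		if v > q.Max {
			q.Max = v
		}
	}
	return q
}

// level returns the code position in [0, 255], clamping values that fall
// outside the trained range (as a query vector might).
func (q scalarQuantizer) level(v float32) float64 {
	if q.Max == q.Min {
		return 128 // degenerate range: every value maps to the middle code
	}
	t := (float64(v) - float64(q.Min)) / (float64(q.Max) - float64(q.Min))
	return math.Max(0, math.Min(255, math.Round(t*255)))
}

// toUint8 yields codes in [0, 255]; toInt8 shifts the same codes to [-128, 127].
func (q scalarQuantizer) toUint8(v float32) uint8 { return uint8(q.level(v)) }
func (q scalarQuantizer) toInt8(v float32) int8   { return int8(q.level(v) - 128) }

func main() {
	q := train([]float32{-1, 0, 1})
	for _, v := range []float32{-1, 0, 1, 2} {
		fmt.Println(v, q.toInt8(v), q.toUint8(v))
	}
}
```

A query is quantized with the same trained range as the stored rows, which is why the tests route a half query through quantize_query before searching the int8/uint8 codes.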

…crashing) The C constructors gpu_{cagra,ivf_flat,ivf_pq}_new[_empty|_from_data_file|_load_file] take (quantization_t btype, quantization_t qtype) since the (btype,qtype) dispatch landed, but the Python ctypes binding still passed a single quantization int, so every argument after it was misaligned and CagraIndex/IvfPqIndex/IvfFlatIndex.create() SIGSEGV'd (test_cagra core-dumped; brute_force already carried btype and passed). Add the btype c_int to the 10 construct argtypes and a btype=Quantization.F32 parameter to the create/create_empty/load_file methods, passing int(btype), int(qtype). brute_force/kmeans untouched. All 12 Python tests pass again. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Build on the btype construct fix: carry the base type through the python binding so a vecf16 base actually works. Add _np_dtype_for() (Quantization -> numpy dtype); CagraIndex/IvfPqIndex.create() now build the dataset in the base dtype (float16 for btype=F16, not coerced to float32) and remember btype on the index; search()/train_quantizer() coerce base-typed buffers to that dtype. Add test_cagra_f16_quantize / test_ivf_pq_f16_quantize: vecf16 base quantized to int8 and uint8 via the native half-source quantizer, build + search. All 14 python tests pass on GPU. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…n GPU) Correct CagraBuildParams.AttachDatasetOnBuild and CagraSearchParams.ItopkSize (the committed names AddDataOnBuild/ITopKSize don't exist, so the gpu build failed), and scope the check to "build+search returns k valid neighbors" instead of an exact self-match — quantized recall is covered by the C++ Int8VsUint8SignedDataHalf test. TestGpuF16QuantizeAll now passes on GPU for IVF-PQ and CAGRA x {int8, uint8}. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the earlier shape-only check with a meaningful correctness assertion: each probe is an exact copy of a stored row, so a working quantized search must return its id in the top-k for >=80% of probes (a broken search scores ~0). Use Default{IvfPq,Cagra}{Build,Search}Params and override only NLists, instead of struct literals. A struct literal zero-defaults omitted fields -- in particular IvfPqBuildParams.KmeansTrainsetFraction=0 means no kmeans training, degenerate IVF centroids, and near-zero recall (identical for f32 and f16, so not a type bug). Diagnosed by comparing f32->int8, f16->int8 and native f32->f32, which all returned the same neighbours. Verified on GPU: IVF-PQ and CAGRA x {int8, uint8} all pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
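The struct-literal pitfall and the >=80% self-match assertion can be sketched as follows. The types here are local stand-ins, not the real pkg/cuvs definitions: `IvfPqBuildParams` carries only the two fields named in the commit, the default values inside `DefaultIvfPqBuildParams` are made up for illustration, and `selfMatchRate` is a hypothetical helper.

```go
package main

import "fmt"

// IvfPqBuildParams is a hypothetical stand-in with only the two fields
// the commit message mentions.
type IvfPqBuildParams struct {
	NLists                 int
	KmeansTrainsetFraction float64
}

// DefaultIvfPqBuildParams returns illustrative defaults; the numbers are
// assumptions for this sketch, not the library's actual values.
func DefaultIvfPqBuildParams() IvfPqBuildParams {
	return IvfPqBuildParams{NLists: 1024, KmeansTrainsetFraction: 0.5}
}

// selfMatchRate returns the fraction of probes whose own id appears
// anywhere in that probe's top-k result list.
func selfMatchRate(probeIDs []int64, results [][]int64) float64 {
	if len(probeIDs) == 0 {
		return 0
	}
	hits := 0
	for i, id := range probeIDs {
		for _, neighbour := range results[i] {
			if neighbour == id {
				hits++
				break
			}
		}
	}
	return float64(hits) / float64(len(probeIDs))
}

func main() {
	// A struct literal silently zeroes every field it omits.
	literal := IvfPqBuildParams{NLists: 16}

	// Starting from the defaults and overriding one field keeps the rest.
	params := DefaultIvfPqBuildParams()
	params.NLists = 16

	fmt.Println(literal.KmeansTrainsetFraction, params.KmeansTrainsetFraction)

	// Four of five probes find their own id in the top-k.
	rate := selfMatchRate(
		[]int64{1, 2, 3, 4, 5},
		[][]int64{{1, 9}, {2, 8}, {7, 3}, {4, 6}, {0, 9}},
	)
	fmt.Println(rate >= 0.8)
}
```

Overriding a single field on a defaults value is the pattern the commit switches to; the literal form leaves KmeansTrainsetFraction at 0, which the commit identifies as the cause of the near-zero recall.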

The f16 base + native int8/uint8 quantization work the plan described is done and verified (C++/Go/python tests + existing SQL BVT); drop the planning doc. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cpegeric and others added 30 commits May 20, 2026 11:46

Merge branch 'iscp_resolve_variable' into gpu_plugin_cuvs

ecb1845

ivfpq_create sync

2a62d76

Merge branch 'gpu_async_search' into gpu_plugin_all

bb80c0d

Merge branch 'gpu_plugin_all' into gpu_plugin_logging

e83bdd3

more log

3c9f93c

bug fix cuvs cdc

de7321c

Merge branch 'main' into gpu_async_search

0765e3a

Merge branch 'gpu_async_search' into gpu_plugin_all

b035180

max overflow size for effective brute force index

005993b

fix UT

adcf739

fix: HLC is ahead of wall clock. iscp failed to update

d7df4c7

fix avoid neighbour MAX_INT32 junk and return -1 for invalid neighbour id

a778880

bug fix REINDEX with options

73c9720

fix ivfpq/cagra idxcron

e3d5984

cpegeric temporarily deployed to ci June 24, 2026 14:06 — with GitHub Actions Inactive

XuPeng-SH requested changes Jun 24, 2026

heni02 approved these changes Jun 25, 2026

aunjgr approved these changes Jun 25, 2026

cpegeric and others added 14 commits June 25, 2026 13:04

Merge branch 'usearch_build_fix' of github.com:cpegeric/matrixone into usearch_build_fix

ff5ae42

Merge branch 'main' into usearch_build_fix

24afe65

Merge branch 'usearch_build_fix' of github.com:cpegeric/matrixone into usearch_build_fix

12e13a1

Merge branch 'usearch_build_fix' of github.com:cpegeric/matrixone into usearch_build_fix

2a267aa

Merge branch 'usearch_build_fix' into cuvs_quantize

0bc7f86

Merge branch 'usearch_build_fix' into cuvs_quantize

ea5707d

build: bump bytedance/sonic to v1.15.2 for Go 1.26 compatibility

c143b7e

gouhongshen approved these changes Jun 26, 2026

cpegeric and others added 8 commits June 26, 2026 11:52

docs: remove cuvs_float16.md plan (implementation complete)

0e03432

Merge branch 'main' into cuvs_quantize

1725ba2

IVFFLAT/IVFPQ/CAGRA support bf16, float16, int8 and uint8 quantization #25095

cpegeric wants to merge 895 commits into matrixorigin:main from cpegeric:cuvs_quantize

7 participants
